Improving Multi-Document Summarization via Text Classification
Authors
Abstract
Multi-document summarization has reached a bottleneck, owing to the lack of sufficient training data and of diverse document categories. Text classification makes up for exactly these deficiencies. In this paper, we propose a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization. TCSum projects documents onto distributed representations that act as a bridge between text classification and summarization. It also uses the classification results to produce summaries of different styles. Extensive experiments on the DUC generic multi-document summarization datasets show that TCSum achieves state-of-the-art performance without using any hand-crafted features and can capture the variations in summary style across different text categories.
Similar resources
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Single Document Keyphrase Extraction Using Label Information
Keyphrases have found wide-ranging application in NLP and IR tasks such as document summarization, indexing, labeling, clustering and classification. In this paper we pose the problem of extracting label-specific keyphrases from a document that has document-level metadata associated with it, namely labels or tags (i.e. a multi-labeled document). Unlike other, supervised or unsupervised, methods f...
Supervised and Unsupervised Text Classification via Generic Summarization
This paper presents a new generic text summarization method using Non-negative Matrix Factorization (NMF) to estimate sentence relevance. The proposed sentence relevance estimation is based on normalization of the NMF topic space and further weighting of each topic using the sentence representations in the topic space. The proposed method shows better summarization quality and performance than state of the art...
A new graph based text segmentation using Wikipedia for automatic text summarization
The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of documents, presenting the user with a summary of each document greatly facilitates the task of finding the desired documents. Document summarization is a process of...
Text Summarization Using Cuckoo Search Optimization Algorithm
Today, with the rapid growth of the World Wide Web and the creation of Internet sites and online text resources, text summarization has attracted the attention of many researchers. Extractive text summarization is an important summarization method that consists of selecting the top representative sentences from the input document. When we are facing large-volume documents, the extr...